Creating synthetic temporal document collections
Abstract
In research on temporal document databases, large temporal document collections are needed to compare and evaluate new strategies and algorithms. Because such collections are not easily available, an alternative is to create synthetic document collections. In this paper we describe how to generate synthetic temporal document collections and how this is realized in the TDocGen temporal document generator, and we present a study of the quality of the document collections created by TDocGen.
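The abstract does not spell out TDocGen's generation model, so the following is only a hypothetical illustration of the general idea of a synthetic temporal document collection, not TDocGen's method. It assumes a Zipf-like word distribution, a per-step probability that a document is updated (`update_prob`), and a fraction of words changed per update (`change_frac`); the names `generate_collection` and `Version` are invented for this sketch. Each update stores a new timestamped version, so the collection grows along the time dimension.

```python
import random
from dataclasses import dataclass


@dataclass
class Version:
    """One timestamped version of a document (hypothetical record type)."""
    doc_id: int
    timestamp: int
    words: list


def zipf_sampler(vocab_size: int, rng: random.Random, s: float = 1.0):
    """Return a word sampler with Zipf-like frequencies (an assumed model)."""
    vocab = [f"w{i}" for i in range(vocab_size)]
    weights = [1.0 / (rank ** s) for rank in range(1, vocab_size + 1)]

    def draw(k: int) -> list:
        return rng.choices(vocab, weights=weights, k=k)

    return draw


def generate_collection(num_docs, doc_len, num_steps, update_prob,
                        change_frac, vocab_size=1000, seed=0):
    """Hypothetical generator: initial documents plus random edits over time."""
    rng = random.Random(seed)
    draw = zipf_sampler(vocab_size, rng)
    # Time step 0: every document gets an initial version.
    current = {d: draw(doc_len) for d in range(num_docs)}
    versions = [Version(d, 0, list(words)) for d, words in current.items()]
    for t in range(1, num_steps + 1):
        for d in range(num_docs):
            if rng.random() >= update_prob:
                continue  # document not updated at this time step
            words = list(current[d])  # copy so earlier versions stay unchanged
            n_changes = max(1, int(change_frac * len(words)))
            for _ in range(n_changes):
                op = rng.choice(("insert", "delete", "replace"))
                if op == "insert" or not words:
                    words.insert(rng.randrange(len(words) + 1), draw(1)[0])
                elif op == "delete":
                    del words[rng.randrange(len(words))]
                else:
                    words[rng.randrange(len(words))] = draw(1)[0]
            current[d] = words
            versions.append(Version(d, t, words))
    return versions
```

For example, `generate_collection(num_docs=5, doc_len=20, num_steps=10, update_prob=0.3, change_frac=0.1)` returns the five initial versions followed by the timestamped updates. A fixed `seed` keeps the collection reproducible, which matters when the same synthetic data is used to compare algorithms.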
Similar resources
Creating Synthetic Temporal Document Collections for Web Archive Benchmarking
In research on web archives, large temporal document collections are needed to compare and evaluate new strategies and algorithms. Because such collections are not easily available, an alternative is to create synthetic document collections. In this paper we describe how to generate synthetic temporal document collections, how this is realized in the T...
Indexing Techniques for Temporal Text Containment Queries
Many information management systems maintain multiple time stamped versions of documents. The archives of web pages, version control systems, wikis and backup mechanisms are examples of such systems. For such temporally versioned document collections, a search using keywords along the temporal dimension is valuable. This paper studies the temporal dimension of keyword search in the context of t...
Interactive Demo: Stay in Touch with InfoVis – Visualizing Document Collections with Document Cards
Large document collections are essential resources for a wide variety of professionals, like scientists, lawyers, analysts, etc. An electronic document management system can assist them in solving the tedious tasks of curating, browsing, searching, and recognizing documents in these collections. As an initial step in creating such a system, we invented the Document Cards [3] as a mixed image-te...
Mining Association Rules in Temporal Document Collections
In this paper we describe how to mine association rules in temporal document collections. We describe how to perform the various steps in the temporal text mining process, including data cleaning, text refinement, temporal association rule mining and rule post-processing. We also describe the Temporal Text Mining Testbench, which is a user-friendly and versatile tool for performing temporal tex...
DIGITALHISTORIAN: Search & Analytics Using Annotations
Born-digital document collections contain vast amounts of historical facts and knowledge. However, manual assessment of these large text collections is infeasible. In this paper, we demonstrate a retrieval system, DIGITALHISTORIAN, that analyzes these document collections using semantic annotations in the form of temporal expressions and named entities linked to a knowledge graph. For queries a...
Publication date: 2004